PyDigger - unearthing stuff about Python


Name Version Summary Date
mawo-razdel 1.0.2 Advanced tokenization for Russian with SynTagRus patterns and +25% accuracy 2025-11-01 08:58:42
durak-nlp 0.2.3 Durak: modular Turkish NLP preprocessing toolkit. 2025-10-30 01:39:43
maze-dataset 1.4.1 generating and working with datasets of mazes 2025-10-17 12:31:39
acoustic-solo-dadaGP 0.1.0 Guitar tablature tokenization and processing toolkit. 2025-10-10 20:21:49
bidnlp 0.1.4 A Comprehensive Persian (Farsi) Natural Language Processing Library 2025-10-09 09:23:56
aynlp 0.1.3 AYNLP: A lightweight NLP toolkit built by Ankit and Yash for tokenization, stemming, lemmatization, and more. Visit https://github.com/aijadugar/AYNLP to explore the project. 2025-10-07 06:43:56
tamil-utils 0.4.0 Tiny Tamil text utilities: normalize, tokenize, stopwords, graphemes, n-grams, syllables, Tamil collation; dataset preprocessor; optional spaCy tokenizer hook. 2025-09-17 14:24:34
wizardspell 1.0.0 Dictionary-based spell checking with Unicode-aware tokenization and light normalization. Supports 62 languages via compressed Marisa-Trie dictionaries and returns a compact report of misspellings. 2025-08-28 15:46:47
nupunkt-rs 0.1.1 High-performance Rust implementation of nupunkt sentence/paragraph tokenization 2025-08-16 02:05:56
tokker 0.3.9 Tokker: a fast local-first CLI tokenizer with all the best models in one place 2025-08-09 17:59:33
chunkipy 1.0.0.post1 Chunkipy is an easy-to-use library for chunking text based on the size estimator function you provide. 2025-08-08 12:37:03
ultranlp 1.0.6 Ultra-fast, comprehensive NLP preprocessing library with advanced tokenization 2025-08-02 10:21:43
sakurs 0.1.1 Fast, parallel sentence boundary detection using Delta-Stack Monoid algorithm 2025-07-27 15:33:39
llmvision 0.1.1 Visualize how LLMs tokenize text 2025-07-26 07:34:54
miditok 3.0.6.post1 MIDI / symbolic music tokenizers for Deep Learning models. 2025-07-22 12:41:12
rs-bpe 0.1.0 A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust 2025-03-19 05:58:24
nlpashto 0.0.25 Pashto Natural Language Processing Toolkit 2025-02-01 11:17:44
code-tokenize 0.2.1 Fast program tokenization and structural analysis in Python 2025-01-14 09:17:25
rftokenizer 2.3.0 A character-wise tokenizer for morphologically rich languages 2024-12-17 19:05:30
alphacodings 0.2.0 base26 ([A-Z]) and base52 ([A-Za-z]) encodings 2024-12-09 03:04:43